Merged
Contributor
Pull Request Overview
This PR adds support for the MMLU Pro benchmark, a multiple-choice question answering task from the TIGER-Lab/MMLU-Pro dataset.
- Introduces a new MMLU Pro task configuration
- Implements a custom prompt function for MMLU Pro questions
- Configures evaluation on the test split with validation for few-shots
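As a rough illustration of the configuration described above, here is a minimal sketch of the keyword arguments such a task might carry. The field names mirror parameters mentioned in this review (`generation_size`, `stop_sequence`) and the splits named in the overview; the dictionary itself, the `hf_subset` value, and the helper `missing_generative_fields` are hypothetical and are not taken from the PR.

```python
# Hypothetical keyword arguments for the MMLU Pro task configuration.
# Values are illustrative assumptions, not the PR's actual settings.
mmlu_pro_task_kwargs = {
    "name": "mmlu_pro",
    "hf_repo": "TIGER-Lab/MMLU-Pro",
    "hf_subset": "default",          # assumption: subset name not stated in the review
    "evaluation_splits": ("test",),  # evaluate on the test split
    "few_shots_split": "validation", # draw few-shot examples from validation
    "generation_size": 32768,        # leaves room for a reasoning trace; the review also mentions 30
    "stop_sequence": [],             # empty list, so generation stops at the EOS token
}


def missing_generative_fields(kwargs):
    """Return the generative-only fields that are absent from a config dict."""
    required = ("generation_size", "stop_sequence")
    return [field for field in required if field not in kwargs]
```

A check like `missing_generative_fields(mmlu_pro_task_kwargs)` would return an empty list here, and would name both fields for a configuration that omits them.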
Comments suppressed due to low confidence (8)
src/lighteval/tasks/tasks/mmlu_pro.py:74
- The task configuration is missing the generation_size parameter, which is required for generative metrics like gpqa_instruct_metric. Based on similar tasks using this metric (e.g., gpqa.py lines 57, 73, 89), a value like generation_size=30 or generation_size=32768 should be specified depending on whether reasoning traces are expected.
src/lighteval/tasks/tasks/mmlu_pro.py:74
- The task configuration is missing the stop_sequence parameter. Based on the generative nature of the task and similar configurations (e.g., gpqa.py lines 59, 75, 91), stop_sequence=[] should be explicitly set to use the EOS token.
src/lighteval/tasks/tasks/mmlu_pro.py:23
- Import of 'LogLikelihoodAccMetric' is not used.
https://arxiv.org/abs/2406.01574
"""
from string import ascii_uppercase
src/lighteval/tasks/tasks/mmlu_pro.py:25
- Import of 'LogProbCharNorm' is not used.
- Import of 'LogProbPMINorm' is not used.
- Import of 'LogProbTokenNorm' is not used.
from lighteval.metrics.metrics import Metrics
src/lighteval/tasks/tasks/mmlu_pro.py:27
- Import of 'get_metrics_for_formulation' is not used.
from lighteval.tasks.requests import Doc
src/lighteval/tasks/tasks/mmlu_pro.py:29
- Import of 'get_mcq_prompt_function' is not used.
src/lighteval/tasks/tasks/mmlu_pro.py:34
- Import of 'CFFormulation' is not used.
- Import of 'HybridFormulation' is not used.
- Import of 'MCFFormulation' is not used.
TEMPLATE = """
Answer the following multiple choice question. The last line of your response should be of the following format: 'Answer: $LETTER' (without quotes) where LETTER is one of ABCD. Think step by step before answering.
{question}
src/lighteval/tasks/tasks/mmlu_pro.py:35
- Import of 'Language' is not used.
{choices}
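To make the prompt function mentioned in the overview concrete, here is a minimal sketch of how the quoted TEMPLATE fragments might be filled in. It assumes each dataset row exposes `question`, `options`, and `answer_index` fields; the function names `build_mmlu_pro_query` and `gold_letter` and the `A) ...` choice layout are hypothetical. One deliberate difference from the quoted template: it hard-codes "ABCD", while this sketch substitutes the letters that actually match the number of options. That substitution is this sketch's choice, not something the PR is known to do.

```python
from string import ascii_uppercase

# Adapted from the template quoted in the review; {letters} replaces the
# hard-coded "ABCD" (an assumption of this sketch, not the PR's code).
TEMPLATE = """
Answer the following multiple choice question. The last line of your response should be of the following format: 'Answer: $LETTER' (without quotes) where LETTER is one of {letters}. Think step by step before answering.

{question}

{choices}
""".strip()


def build_mmlu_pro_query(line):
    """Format one dataset row (a dict) into the prompt text."""
    options = line["options"]
    letters = ascii_uppercase[: len(options)]
    # One choice per line, labelled A), B), C), ...
    choices = "\n".join(
        f"{letter}) {option}" for letter, option in zip(letters, options)
    )
    return TEMPLATE.format(
        letters=letters, question=line["question"], choices=choices
    )


def gold_letter(line):
    """Map the row's zero-based answer index to its letter label."""
    return ascii_uppercase[line["answer_index"]]
```

In an actual task file, the resulting query and gold letter would presumably be wrapped in the `Doc` object imported above; that wiring is left out because its exact arguments are not shown in this conversation.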
Collaborator
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Member
Thanks @NathanHB, works perfectly!